In this notebook we present our solution for tracking the Cyclades board game. We show how we implemented the game logic and how we tracked objects on the board, as well as events.
We recorded videos at 3 levels of difficulty. It turned out that even the first one is very hard, corresponding to level 3 in the project description: an angled view of the board, a slightly shaking camera, and elements occasionally covered by a hand. The second level is harder, as it introduces a slightly different angle and different lighting conditions (random shadows). The third one is almost impossible to process, because it has a dynamic camera view at an acute angle and random shadows. There are also many objects on the board, which makes tracking even harder.
We decided to track several objects:
We also track several events:
As gameplay status and score, we track:
import cv2
import numpy as np
import PIL.Image
from IPython.display import display

def imshow(a):
    a = a.clip(0, 255).astype('uint8')
    if a.ndim == 3:
        if a.shape[2] == 4:
            a = cv2.cvtColor(a, cv2.COLOR_BGRA2RGBA)
        else:
            a = cv2.cvtColor(a, cv2.COLOR_BGR2RGB)
    display(PIL.Image.fromarray(a))
def read_and_concatenate(img_paths, resize=0.4):
    images = []
    for img_path in img_paths:
        images.append(cv2.resize(cv2.imread(f"report_data/{img_path}"), None, fx=resize, fy=resize))
    imshow(np.concatenate(images, axis=1))
As stated above, the first level has the following conditions:
Example of the frames from the first level of difficulty:
read_and_concatenate([f"level1_{x}.png" for x in range(1,4)])
Here we introduced random shadows and different lighting conditions.
read_and_concatenate([f"level2_{x}.png" for x in range(3)])
Here we introduced a dynamic camera view and random shadows. There are also many objects on the board, which makes tracking even harder.
read_and_concatenate([f"level3_{x}.png" for x in range(1,4)])
Before applying any tracking technique, we preprocessed the video.
First, we perform CLAHE equalization on the color image to counter difficult lighting conditions: board_preparator.equalize_color_image
read_and_concatenate([f"hist{x}.png" for x in range(2)], resize=0.8)
Then, to compensate for the shaky camera, we perform image alignment based on keypoints. We keep one "ideal" board image to which we map every processed frame: board_preparator.alignImageToFirstFrame
This is actually the most costly operation of the whole pipeline.
read_and_concatenate([f"warp{x}.jpg" for x in range(3)], resize=0.8)
As you can see, the image was rotated slightly here so that keypoint positions correspond to one another. Still, when the camera is shaky, the view sometimes changes a lot. To minimize distortions we periodically reinitialize the reference frame: board_preparator.reinitialize_first_frame
Next, we want to filter out the region to the left of the board, which does not help at all. We use HoughLines to do so. We always perform this operation on the same empty-board reference image, so it is deterministic. In practice one could instead ask the user to manually label the region to be filtered: board_preparator.get_mask_of_left_mess. At every step we remove this noise by multiplying the frame by the filtering mask.
read_and_concatenate([f"left_line{x}.png" for x in range(4)], resize=1)
Next, we separated the board into 2 parts.
To do so, we applied an averaging convolution to the red and blue channels with a kernel of size 10, then obtained a mask by simple thresholding. Next, we took the first vertical line from the left where the sum of the mask exceeded a threshold. The code of this function is in board_preparator.find_separating_line
read_and_concatenate([f"sep_line{x}.png" for x in range(3)], resize=1)
We don't really know a priori what we want to track (players can play with 4 different colors, there may be different numbers of gods, there can be dozens of items on the board at the same time, players can cover items with a hand, etc.), but we also wanted something that works in real time, so we used a heuristic. At every frame we perform foreground extraction. If some item has shown up in the same place a few times in a row, we check whether it is really an object. Most of the time the board is very static, so why spend a lot of resources on every frame? We perform the more costly operations only when necessary, and we do not scan the entire board looking for dozens of items: utils.update_interesting_objects. When we detect an object, we also check at the next iterations whether it is still there. If an object disappeared from one location and showed up at another, we track this as a move. We could detect an object and then run a tracker, but trackers lost objects, as pawns can move literally anywhere on the board.
read_and_concatenate([f"tracking{x}.png" for x in range(4)], resize=1)
Here, the same object was seen three times in the same place, so we should check what it is. We analyze every 10th frame, so for 1 second (the first 3 frames) we didn't have to perform any logic-related operation, as nothing really happened on the board.
What if something is on the board from the very beginning? At the start we pass an empty frame through the background extractor 10 times. When the first real frame differs from the empty board, we see the difference and can process every object.
read_and_concatenate([f"immadiate{x}.png" for x in range(2)], resize=0.8)
Unfortunately, this doesn't work that well when the lighting conditions change rapidly. We have only one video in excellent lighting conditions where the board is entirely empty. In real life this shouldn't be a big problem, as every game starts with nothing on the board. Alternatively, we could run detection for everything on the first frame, but this would add a plethora of code, so we stick with this approach.
read_and_concatenate([f"immadiate{x}.png" for x in range(2,4)], resize=0.8)
Now we analyze both parts separately. At the beginning we find circles and create a sea/land segmentation map. Green circles are land and blue circles are sea. We couldn't detect all the circles, which makes the analysis a little harder: utils.find_circles, right_part_analyzer.draw_circles, right_part_analyzer.label_circles
We find circles using Hough circles. After finding the circles on the board, we label them using heuristics based on color segmentation: we take reference colors for land and water and compute the ratio of matching pixels to label each circle.
read_and_concatenate([f"circles{x}.png" for x in range(2)], resize=0.8)
Using the masks above we can define and draw island ellipses; we can also detect who possesses an island by checking whose warriors stand there: right_part_analyzer.detect_islands
imshow(cv2.imread("report_data/island.jpg"))
Every time something new appears, we detect the object with heuristics. We tried different ones: template matching failed due to being rotation-variant, and keypoints failed because the objects were too small to detect keypoints at all. Eventually we stuck with color filtering, which actually works pretty well. The entire object-recognition logic is in right/left_part_analyzer
read_and_concatenate([f"segmentation_right{x}.png" for x in range(1,7)], resize=0.8)
Here we see the detection of a black ship and red and yellow warriors. We applied this successfully to other object types like cities, god cards, and pawns.
For the left part we use the same techniques, with one exception. Since there is a lot of noise there after a god card is placed, we zero the mask in the place where we detect objects. When the object disappears from that place, the mask can be reset.
read_and_concatenate([f"mask_clean{x}.png" for x in range(0,4)], resize=0.8)
We count everything and display it to the user.
imshow(cv2.imread("report_data/scores.png"))
We process every 10th frame, which gives 3 FPS.
For each frame we do the same steps as above: we resize it and equalize the colors in the frame.
Then we try to align the current frame to the first frame using previously calculated keypoints and descriptors. We use the ORB detector and descriptor, then a brute-force matcher to find the best matches. We find a homography and use it to warp the perspective.
Then we split the frame using the previously calculated separating line, and apply foreground extraction, which is also divided into 2 parts.
Foreground extraction is used to see if some new region of interest emerged.
Now the most important part takes place. We look for and update the interesting objects for both the left and the right side of the board, but separately.
First, we detect contours on the foreground shown previously. Then we use the heuristic that an object must persist for 3 frames (1 s) in the foreground to be classified. The classify phase just uses different color masks: we either classify the object or mark it as unknown. We distinguish warriors from ships by checking whether the object is on an island or on the sea, using the previously segmented island map. We keep the objects that are not unknown and have a suitable area, and mark each with a rectangle and a text label in the matching color.
If the detected counter is a warrior, we assign the island to that player (using colors). We check if the warrior's coordinates are within the island's coordinates, then mark the island with the warrior's color.
We also detect when a counter is moved. When we detect each ship/warrior, we iterate over all previous counters of its type and check whether each one is still there. If one isn't, that's a moved counter.
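This bookkeeping can be sketched as a simple set comparison per piece type; the function name and tolerance below are illustrative assumptions:

```python
# Hypothetical sketch of move detection: a piece that vanished from one
# place while a new one of the same type appeared elsewhere is a move.
def detect_moves(previous, current, same_place_tol=15):
    # previous/current: lists of (x, y) centers for one piece type and color
    def close(a, b):
        return abs(a[0] - b[0]) <= same_place_tol and abs(a[1] - b[1]) <= same_place_tol
    vanished = [p for p in previous if not any(close(p, c) for c in current)]
    appeared = [c for c in current if not any(close(c, p) for p in previous)]
    # pair vanished and appeared positions greedily as moves
    return list(zip(vanished, appeared))
```

For example, if a warrior was at (50, 50) and a new one appears at (90, 90) while (50, 50) is now empty, this reports one move from (50, 50) to (90, 90).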
For other objects, the processing is very similar to what is described here for ships/warriors.
(python video player with results)
Without CLAHE and warping, this approach would not work at all.